feat(agent): classify tool-call failures + surface on web event — rework of #4413 (backend) by M3gA-Mind · Pull Request #4445 · tinyhumansai/openhuman

M3gA-Mind · 2026-07-03T15:02:22Z

Summary

Rework of #4413's backend onto post-#4399 main. Classifies a failed tool call into a user-facing explanation (plain-language cause + what to do next) and surfaces it on the tool_result socket event, so the chat "View processing" timeline can render "why / next" instead of a raw error.

Why a rework

#4399 (the TinyAgents harness migration) deleted the original classification hook (session/agent_tool_exec.rs) and the ProgressReporter trait, and moved the tool loop onto the tinyagents crate. So the hook is re-homed on the adapter seam.

Changes

New src/openhuman/tool_status/ — classify(error_text, timed_out) + describe(class) → ClassifiedFailure (class, category, cause, next action). Pure, 16 unit tests.
failure field on AgentProgress::ToolCallCompleted and WebChannelEvent.
Core hook + latent-bug fix: the crate's AgentEvent::ToolCompleted carries only call_id + tool_name, so the bridge previously hardcoded success: true — a failed tool never surfaced. ToolOutcomeCaptureMiddleware::after_tool now classifies each result and writes a call_id → (success, failure) side-channel (mirroring the existing ToolNameMap); the bridge reads it to emit real success + failure (fallback (true, None) if absent).
progress_bridge serializes failure onto the socket event + run ledger.

Scope note — backend only

The frontend consumer (timeline render) + i18n keys are a companion. They rebase cleanly onto current main from #4413 (only the Rust hook conflicted with #4399); taking the whole i18n files here would revert unrelated locale drift on main, so they're intentionally excluded. See #4413.

Validation

16 tool_status unit tests pass; lib + test build green (with the fix(orchestration): repair TinyAgents-migration build break (main does not compile) #4442 build-fix applied).
Runtime validation still needed (can't run here; needs a running app + fix(orchestration): repair TinyAgents-migration build break (main does not compile) #4442 merged): drive a deliberately-failing tool and confirm success: false + a failure object on the tool_result event. The middleware-before-event ordering that fills the side-channel is crate-internal; the fallback keeps behavior no worse than today if that assumption is ever wrong.

Summary by CodeRabbit

New Features
- Tool failures now include structured, human-readable classification details when available.
- Tool result and chat events can now carry optional failure information for downstream display.
- Improved handling of tool completion events to preserve failure context across the app.
Bug Fixes
- Better detection of common tool error cases such as timeouts, blocked actions, missing permissions, and unavailable services.
- Event processing is now more tolerant of additional tool completion data.

coderabbitai · 2026-07-03T15:02:56Z

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 22 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3bfda4af-b218-4dfb-86c2-2c59da138c51

📥 Commits

Reviewing files that changed from the base of the PR and between 96b4414 and 72cf9d2.

📒 Files selected for processing (17)

src/bin/harness_subagent_audit.rs
src/core/socketio.rs
src/openhuman/agent/harness/session/tool_progress.rs
src/openhuman/agent/progress.rs
src/openhuman/agent/progress_tracing/tests.rs
src/openhuman/channels/proactive.rs
src/openhuman/channels/providers/presentation.rs
src/openhuman/channels/providers/web/progress_bridge.rs
src/openhuman/mod.rs
src/openhuman/threads/turn_state/mirror_tests.rs
src/openhuman/tinyagents/middleware.rs
src/openhuman/tinyagents/mod.rs
src/openhuman/tinyagents/observability.rs
src/openhuman/tool_status/mod.rs
src/openhuman/tool_status/ops.rs
src/openhuman/tool_status/types.rs
tests/memory_threads_raw_coverage_e2e.rs

📝 Walkthrough

Walkthrough

Introduces a new tool_status module with failure classification types and logic, adds an optional failure field to AgentProgress::ToolCallCompleted and WebChannelEvent, threads a ToolFailureMap side-channel through TinyAgents middleware and the observability bridge, and forwards classified failures into web-channel and ledger events.

Changes

Tool failure classification pipeline

Layer / File(s)	Summary
Tool status classification types and logic `src/openhuman/tool_status/*`, `src/openhuman/mod.rs`	New `ToolLifecycleState`, `ToolFailureClass`, `FailureCategory`, `ClassifiedFailure` types and `classify`/`describe` functions are added and exposed via a new public `tool_status` module with unit tests.
Progress event and WebChannelEvent failure field `src/openhuman/agent/progress.rs`, `src/core/socketio.rs`	`AgentProgress::ToolCallCompleted` and `WebChannelEvent` gain an optional `failure` field, populated only on failure and omitted when absent.
Middleware captures failure classification `src/openhuman/tinyagents/middleware.rs`	`ToolOutcomeCaptureMiddleware` now classifies tool errors (policy-blocked, timeout, or generic) and records success/failure per `call_id` into a new `failure_map`.
Observability bridge and harness wiring of failure_map `src/openhuman/tinyagents/observability.rs`, `src/openhuman/tinyagents/mod.rs`	A shared `ToolFailureMap` is threaded through `OpenhumanEventBridge` and `AssembledTurnHarness`, and used to derive success/failure when projecting `ToolCallCompleted` events instead of always reporting success.
Web progress bridge forwards failure to socket events `src/openhuman/channels/providers/web/progress_bridge.rs`	Serializes the classified failure and forwards it into ledger `tool_call_completed` payloads and emitted `tool_result` web events.
Downstream event constructors updated for new failure field `src/bin/harness_subagent_audit.rs`, `src/openhuman/agent/harness/session/tool_progress.rs`, `src/openhuman/agent/progress_tracing/tests.rs`, `src/openhuman/channels/proactive.rs`, `src/openhuman/channels/providers/presentation.rs`, `src/openhuman/threads/turn_state/mirror_tests.rs`	Existing event construction sites and tests are updated to set `failure: None` or ignore the new field for the updated struct shapes.

Estimated code review effort: 3 (Moderate) | ~30 minutes

Sequence Diagram(s)

sequenceDiagram
    participant Middleware as ToolOutcomeCaptureMiddleware
    participant FailureMap as ToolFailureMap
    participant Bridge as OpenhumanEventBridge
    participant ProgressBridge as spawn_progress_bridge
    participant Socket as WebChannelEvent

    Middleware->>Middleware: after_tool detects result.error
    Middleware->>Middleware: classify error via tool_status::classify/describe
    Middleware->>FailureMap: insert(call_id, (success, failure))
    Bridge->>FailureMap: remove(call_id) on ToolCompleted
    FailureMap-->>Bridge: (success, failure) or absent
    Bridge->>Bridge: build AgentProgress::ToolCallCompleted{success, failure}
    Bridge->>ProgressBridge: emit AgentProgress event
    ProgressBridge->>ProgressBridge: serde_json::to_value(failure) -> failure_json
    ProgressBridge->>Socket: emit tool_result{failure: failure_json}

Suggested labels: rust-core, bug

Poem

A rabbit hopped through failed tool calls,
Classifying why each error falls. 🐰
Timeout, policy, or missing key,
Now wrapped up nice for all to see.
Through map and bridge the failure flows,
Straight to the socket, and off it goes! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR does not address `#4442`’s build-fix items; it adds tool-failure classification instead of restoring SqlRunLedgerCheckpointer, graph_fn, or the doc link.	Update the PR to fix the TinyAgents migration build break by repointing to SqliteCheckpointer, wrapping graph_fn in Some(...), and correcting the stale doc link.
Out of Scope Changes check	⚠️ Warning	Most changes are unrelated to `#4442`, adding new tool-status types, middleware, and web-event failure payloads instead of the requested mechanical build fix.	Trim the diff to the orchestration compile fix or split the failure-classification work into a separate PR.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is concise and accurately summarizes the main change: classifying tool-call failures and surfacing them on web events.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{Comment @coderabbitai help to get the list of available commands.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96b4414dcb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-07-03T15:08:46Z

+            "403",
+            "forbidden",


Don't map external 403s to policy blocks

When an integration or API-backed tool returns a normal authorization failure such as 403 insufficient authentication scopes (this exact Gmail/Composio case is exercised in tests/composio_raw_coverage_e2e.rs), these broad needles classify it as BlockedByPolicy, so the UI tells the user to change Agent access settings and marks it non-recoverable instead of prompting them to reconnect/grant scopes. Please reserve BlockedByPolicy for OpenHuman policy-specific markers/text and classify bare HTTP 403/Forbidden as credentials or user-permission related.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/openhuman/tinyagents/middleware.rs (1)
1399-1409: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Timeout detection via string match on "timed out" is fragile.

Coupling classification to a literal substring in content will silently stop matching if the message that produces it is ever reworded elsewhere in the codebase. A structured timed_out: bool carried on the tool result (or a dedicated error variant) would be more robust than text sniffing, though the current heuristic is explicitly called out as intentional in the comment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tinyagents/middleware.rs` around lines 1399 - 1409, The timeout
classification in the middleware is currently inferred by matching the literal
substring "timed out" in result.content, which is brittle. Update the logic in
tinyagents::middleware::classify to use a structured timeout indicator from the
tool result (for example a timed_out flag or dedicated error variant) instead of
text sniffing, while keeping the existing POLICY_BLOCKED_MARKER handling intact.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/tool_status/ops.rs`:
- Around line 60-75: The numeric status-code needles in the ToolFailureClass
detection logic can false-positive because contains_any does plain substring
matching; update the matching in the relevant classification branches (including
the BlockedByPolicy, BadCredentials, and ServiceUnavailable checks in the ops.rs
failure classifier) so codes like 403, 401, 502, 503, and 504 only match when
not surrounded by other digits. Use a boundary-aware check or equivalent guard
before classifying, while keeping the existing keyword-based needles as-is.

---

Nitpick comments:
In `@src/openhuman/tinyagents/middleware.rs`:
- Around line 1399-1409: The timeout classification in the middleware is
currently inferred by matching the literal substring "timed out" in
result.content, which is brittle. Update the logic in
tinyagents::middleware::classify to use a structured timeout indicator from the
tool result (for example a timed_out flag or dedicated error variant) instead of
text sniffing, while keeping the existing POLICY_BLOCKED_MARKER handling intact.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf1c9617-ea91-4bb0-8793-d408a4150acc

📥 Commits

Reviewing files that changed from the base of the PR and between 8aab529 and 96b4414.

📒 Files selected for processing (16)

src/bin/harness_subagent_audit.rs
src/core/socketio.rs
src/openhuman/agent/harness/session/tool_progress.rs
src/openhuman/agent/progress.rs
src/openhuman/agent/progress_tracing/tests.rs
src/openhuman/channels/proactive.rs
src/openhuman/channels/providers/presentation.rs
src/openhuman/channels/providers/web/progress_bridge.rs
src/openhuman/mod.rs
src/openhuman/threads/turn_state/mirror_tests.rs
src/openhuman/tinyagents/middleware.rs
src/openhuman/tinyagents/mod.rs
src/openhuman/tinyagents/observability.rs
src/openhuman/tool_status/mod.rs
src/openhuman/tool_status/ops.rs
src/openhuman/tool_status/types.rs

…ent (tinyhumansai#4254, backend) Rework of tinyhumansai#4413's backend onto post-tinyhumansai#4399 code. tinyhumansai#4399 deleted the original classification hook (session/agent_tool_exec.rs) and the ProgressReporter trait; the tool loop now runs on the tinyagents crate, so the hook is re-homed on the adapter seam. - New `src/openhuman/tool_status/` module: `classify(error_text, timed_out)` and `describe(class)` map a tool outcome to a `ClassifiedFailure` (class + category + plain-language cause + next action). Pure, unit-tested (16 tests: all classes, timeout/unknown/precedence, category mapping, recoverability). - `AgentProgress::ToolCallCompleted` + `WebChannelEvent` gain a `failure` field carrying that classification to the chat "View processing" timeline. - Core hook + latent-bug fix: the crate's `AgentEvent::ToolCompleted` carries only call_id + tool_name, so the bridge previously hardcoded `success: true` (a failed tool never surfaced). `ToolOutcomeCaptureMiddleware::after_tool` now classifies each result and writes a `call_id → (success, failure)` side-channel (mirroring the existing `ToolNameMap`); the bridge reads it to emit real `success` + `failure`. Absent entry falls back to `(true, None)`. - `progress_bridge` serializes `failure` onto the `tool_result` socket event and the run ledger. Backend only. The frontend consumer (chat timeline "why / next" render) and i18n keys are a companion — they rebase cleanly onto current `main` from tinyhumansai#4413 (only the Rust hook conflicted with tinyhumansai#4399), so they are not re-included here to avoid reverting unrelated locale drift. NOTE: the middleware-before-event ordering that populates the side-channel is crate-internal; the fallback keeps behavior no worse than before if the event is ever projected first. Needs runtime validation once tinyhumansai#4442 lands (drive a deliberately-failing tool and confirm `success:false` + `failure` on the wire).

…tatus needles Addresses CodeRabbit review on tinyhumansai#4445: - Bare HTTP 403/Forbidden (e.g. Gmail/Composio "403 insufficient authentication scopes") was classified as BlockedByPolicy, telling the user to change Agent-access settings and marking it non-recoverable. It's an external authz failure → now BadCredentials (reconnect / grant scopes). OpenHuman's own policy blocks are tagged upstream with POLICY_BLOCKED_MARKER (never reach this heuristic); the OpenHuman-specific 'forbidden path' marker stays policy, checked before credentials so it still wins. - contains_any did unbounded substring matching, so '403'/'401'/'50x' false-matched inside longer digit runs (a '403' in '14033', a port, a byte count, a timestamp). New contains_code() requires non-digit boundaries on each side. - Regression tests: external 403/401 → credentials, 'forbidden path' → policy, embedded 403/503 in longer numbers → Unknown, standalone 503 → service.

…ed in tinyhumansai#4445 tinyhumansai#4413 originally shipped both the tool-failure classification backend and the chat "View processing" render. The backend conflicted with tinyhumansai#4399 (harness migration) and has been reworked separately in tinyhumansai#4445 (tool_status module + ToolOutcomeCaptureMiddleware + WebChannelEvent.failure). This reduces tinyhumansai#4413 to just the frontend that consumes tinyhumansai#4445's `failure` field off the tool_result socket event, rebased cleanly onto current main. Kept (21 files): - chat timeline render: ProcessingTranscriptView (+test), ChatRuntimeProvider, chatService, chatRuntimeSlice (+toolFailure test), turnState. - i18n: conversations.toolFailure.* keys across all 14 locales (3-way unioned onto current main so no existing translations are reverted). Dropped (now owned by tinyhumansai#4445): tool_status/, agent progress/observability, middleware, event_bus, socketio failure plumbing, and the deleted-file conflict on agent_tool_exec.rs. Verified: pnpm typecheck clean, i18n:check parity 0 missing/0 extra, 13 frontend tests pass (chatRuntimeSlice.toolFailure + ProcessingTranscriptView). Stacked on tinyhumansai#4445 (merge that first — it emits the field this renders).

The tool-failure feature added AgentProgress::ToolCallCompleted.failure; an integration test (tests/, a separate compilation unit not covered by `cargo check --lib`) constructed the event without it, failing the Rust Core Coverage job (E0063). `cargo check --tests` is now clean.

…om, terminal-inference halt, autonomous iter-cap lift (tinyhumansai#4463) Restores loop-guard behaviors dropped with the legacy `tool_loop.rs` in the TinyAgents migration; the crate `no_progress` ladder tracks *failures* only and does not replace them. All fixes are seam-side (`src/openhuman/**`); vendored crate untouched. - RepeatProgressMiddleware (new, `after_model` + batch-aware `after_tool`): halts on 3× identical *successful* `(tool, args)` batches (tinyhumansai#4088) and 4× identical outputs (tinyhumansai#4095); `wait_subagent` polling exemption ported. Shares the halt-summary slot + steering handle with the failure breaker. - RepeatedToolFailureMiddleware: differentiated headroom restored — recoverable classes (timeout/transient markers + tinyhumansai#4445 classifier: Timeout/ ServiceUnavailable/ModelConnection) get 8 identical / 12 varied instead of the crate's fixed 3/6; POLICY_DENIED_MARKER rejoins POLICY_BLOCKED_MARKER for the 2-repeat hard-reject fast-trip (tinyhumansai#4463 part 6). - Terminal delegated-inference fast-halt (tinyhumansai#3104): budget-exhausted / provider-config-rejection delegation failures halt on first occurrence, gated on the delegated-inference envelope so recoverable tool stderr isn't caught. - Autonomous iter-cap lift (part 7): `autonomous_iter_cap()` was a dead knob — wired back in via `subagent_iter_cap_with_autonomous_lift` at the subagent `effective_max_iterations()` sites, so task-dispatcher / skill subagents run up to TASK_RUN_MAX_ITERATIONS again instead of the normal per-agent cap. - Updated the adapter-inventory middleware-count assertions (15→16, 11→12). Tests deferred to the final parity test pass. Claude-Session: https://claude.ai/code/session_019j5TLsRLHsM3kqAYFyH4hR

M3gA-Mind requested a review from a team July 3, 2026 15:02

M3gA-Mind mentioned this pull request Jul 3, 2026

feat(chat): render tool-call failure explanations in View-processing timeline (#4254, frontend) #4413

Merged

coderabbitai Bot added bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels Jul 3, 2026

github-project-automation Bot added this to Team Openhuman Jul 3, 2026

github-project-automation Bot moved this to Todo in Team Openhuman Jul 3, 2026

chatgpt-codex-connector Bot reviewed Jul 3, 2026

View reviewed changes

coderabbitai Bot requested changes Jul 3, 2026

View reviewed changes

Comment thread src/openhuman/tool_status/ops.rs Outdated

M3gA-Mind force-pushed the fix/GH-4254-tool-failures-backend-rework branch from 96b4414 to 487be7e Compare July 3, 2026 15:15

coderabbitai Bot previously approved these changes Jul 3, 2026

View reviewed changes

M3gA-Mind dismissed coderabbitai[bot]’s stale review via 72cf9d2 July 3, 2026 16:51

senamakel merged commit b5837ed into tinyhumansai:main Jul 3, 2026
15 checks passed

github-project-automation Bot moved this from Todo to Done in Team Openhuman Jul 3, 2026

This was referenced Jul 3, 2026

Restore loop guards: repeat-call/output breakers, recoverable-failure headroom, terminal-inference halt, autonomous iter-cap lift #4463

Open

TinyAgents migration (v0.58.7 → main) parity & bug-fix tracker — harness audit findings #4470

Open

coderabbitai Bot mentioned this pull request Jul 4, 2026

TinyAgents migration parity & bug fixes (#4451–#4469) #4495

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(agent): classify tool-call failures + surface on web event — rework of #4413 (backend)#4445

feat(agent): classify tool-call failures + surface on web event — rework of #4413 (backend)#4445
senamakel merged 3 commits into
tinyhumansai:mainfrom
M3gA-Mind:fix/GH-4254-tool-failures-backend-rework

M3gA-Mind commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (2 warnings)

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

M3gA-Mind commented Jul 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why a rework

Changes

Scope note — backend only

Validation

Related

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (2 warnings)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

M3gA-Mind commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading